IN THE CLAIMS:



Please substitute the following claims for the same-numbered claims in the application: 

1. (Currently Amended) A system for extracting information comprising:
a query input; 

a database of documents; 

a plurality of classifiers arranged in a hierarchical cascade of classifier layers, wherein 
each classifier comprises a set of weighted training data points comprising feature vectors 
representing any portion of a document, wherein each said feature vector is arranged only as a
vector of counts for all features in a data point, and wherein said classifiers are operable to
retrieve documents from said database based solely on whether said documents are
relevant to said query input; and 

a terminal classifier weighing an output from said cascade according to a rate of success 
of query terms being matched by each layer of said cascade. 
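
For illustration only, the terminal classifier of claim 1 can be read as a weighted combination of per-layer scores, where each layer's weight is the fraction of query terms that layer matched. The names `success_rate` and `terminal_score`, and the normalized weighted average used to combine the scores, are assumptions; the claim does not specify how the weighing is computed.

```python
from typing import List, Sequence, Set


def success_rate(query_terms: Sequence[str], matched_terms: Set[str]) -> float:
    """Hypothetical rate of success: fraction of query terms a layer matched."""
    if not query_terms:
        return 0.0
    hits = sum(1 for term in query_terms if term in matched_terms)
    return hits / len(query_terms)


def terminal_score(layer_scores: Sequence[float],
                   layer_matches: Sequence[Set[str]],
                   query_terms: Sequence[str]) -> float:
    """Combine per-layer scores, weighting each layer by its success rate.

    The normalized weighted average is an assumed combination rule.
    """
    weights: List[float] = [success_rate(query_terms, m) for m in layer_matches]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no layer matched any query term
    return sum(w * s for w, s in zip(weights, layer_scores)) / total


# Example: layer 1 matched one of two query terms, layer 2 matched both.
score = terminal_score([0.2, 0.8], [{"cascade"}, {"cascade", "classifier"}],
                       ["cascade", "classifier"])
```

Here the second layer receives twice the weight of the first because it matched twice as many query terms.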

2. (Original) The system of claim 1, wherein each classifier accepts an input distribution of
data points and transforms said input distribution to an output distribution of said data points. 

3. (Original) The system of claim 2, wherein each classifier is trained by weighing training 
data points at each classifier layer in said cascade by an output distribution generated by each 
previous classifier layer, and wherein weights of said training data points of said first classifier 
layer are uniform.
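
The weighting scheme of claim 3 (uniform weights at the first layer, and each later layer weighted by the previous layer's output distribution) can be sketched as follows. The `Layer` interface and the toy `make_feature_layer` classifier are hypothetical stand-ins chosen only to make the data flow between layers concrete; they are not the claimed classifiers.

```python
from typing import Callable, List, Sequence

# Hypothetical layer interface: takes data points (count vectors) and their
# current weights, returns non-negative scores over the same data points.
Layer = Callable[[Sequence[Sequence[int]], List[float]], List[float]]


def make_feature_layer(feature: int) -> Layer:
    """Toy layer (assumption): rescale each weight by 1 + count of one feature."""
    def layer(points: Sequence[Sequence[int]], weights: List[float]) -> List[float]:
        return [w * (1 + p[feature]) for p, w in zip(points, weights)]
    return layer


def run_cascade(points: Sequence[Sequence[int]],
                layers: Sequence[Layer]) -> List[List[float]]:
    """Feed each layer's normalized output distribution to the next layer as weights."""
    n = len(points)
    weights = [1.0 / n] * n  # first layer: uniform training weights
    outputs: List[List[float]] = []
    for layer in layers:
        raw = layer(points, weights)
        total = sum(raw)
        if total <= 0.0:
            raise ValueError("layer produced no positive mass")
        weights = [r / total for r in raw]  # output distribution of this layer
        outputs.append(weights)
    return outputs


points = [[2, 0], [0, 1], [1, 1]]
dists = run_cascade(points, [make_feature_layer(0), make_feature_layer(1)])
```

Normalizing each output keeps the weights passed to the next layer a proper distribution, so every layer sees inputs on the same scale.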



4. (Original) The system of claim 3, wherein each classifier is trained according to said 
query input. 

5. (Original) The system of claim 2, wherein said query input is based on a minimum 
number of example documents. 

6. (Original) The system of claim 1, wherein said document comprises data points 
comprising feature vectors representing any portion of said document. 

7. (Original) The system of claim 1, wherein said documents comprise a file format capable
of being represented by said feature vectors. 

8. (Original) The system of claim 1, wherein said documents comprise any of text files, 
images, web pages, video files, and audio files. 

9. (Original) The system of claim 2, wherein a classifier at each layer in said hierarchical 
cascade is trained for each layer with an expectation maximization methodology that maximizes 
a likelihood of a joint distribution of said training data points and latent variables. 

10. (Original) The system of claim 9, wherein each layer of said cascade of classifiers is 
trained in succession from a previous layer by said expectation maximization methodology,
wherein said output distribution is used as an input distribution for a succeeding layer. 

11. (Original) The system of claim 9, wherein each layer of said cascade of classifiers is
trained by successive iterations of said expectation maximization methodology until a 
convergence of parameter values associated with said output distribution of each layer occurs in 
succession. 

12. (Original) The system of claim 11, wherein said successive iterations comprise a fixed
number of iterations. 
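
Claims 9 through 12 describe training a layer with expectation maximization over latent variables until parameter values converge or a fixed number of iterations has run. The sketch below rests on assumptions the claims do not state: a two-component multinomial mixture over count vectors with a binary latent "relevant" variable, training-point weights supplied from the previous layer's output distribution, initial responsibilities derived from example documents, and add-alpha smoothing (which makes this a MAP variant of EM rather than pure maximum likelihood). All function names are hypothetical.

```python
import math

EPS = 1e-9  # keeps the mixing prior away from 0 and 1


def _m_step(points, weights, resp, alpha):
    """Weighted M-step for a two-component multinomial mixture.

    Component 0 = 'relevant', component 1 = 'not relevant' (assumed latent variable).
    """
    vocab = len(points[0])
    prior = sum(w * r for w, r in zip(weights, resp)) / sum(weights)
    prior = min(max(prior, EPS), 1.0 - EPS)
    thetas = []
    for comp in (resp, [1.0 - r for r in resp]):
        counts = [alpha] * vocab  # add-alpha smoothing (MAP rather than pure ML)
        for x, w, r in zip(points, weights, comp):
            for j, c in enumerate(x):
                counts[j] += w * r * c
        norm = sum(counts)
        thetas.append([c / norm for c in counts])
    return prior, thetas


def _e_step(points, prior, thetas):
    """E-step: posterior probability that each data point is 'relevant'."""
    resp = []
    for x in points:
        log_rel = math.log(prior) + sum(c * math.log(t) for c, t in zip(x, thetas[0]))
        log_irr = math.log(1.0 - prior) + sum(c * math.log(t) for c, t in zip(x, thetas[1]))
        m = max(log_rel, log_irr)  # log-sum-exp for numerical stability
        a, b = math.exp(log_rel - m), math.exp(log_irr - m)
        resp.append(a / (a + b))
    return resp


def train_layer(points, weights, init_resp, max_iter=100, tol=1e-6, alpha=1.0):
    """Run weighted EM until parameters converge or max_iter iterations elapse.

    Returns the layer's output distribution over the data points.
    """
    prior, thetas = _m_step(points, weights, list(init_resp), alpha)
    for _ in range(max_iter):  # fixed upper bound on iterations
        resp = _e_step(points, prior, thetas)
        new_prior, new_thetas = _m_step(points, weights, resp, alpha)
        delta = max([abs(new_prior - prior)] +
                    [abs(p - q) for old, new in zip(thetas, new_thetas)
                     for p, q in zip(old, new)])
        prior, thetas = new_prior, new_thetas
        if delta < tol:
            break  # convergence of parameter values
    resp = _e_step(points, prior, thetas)
    out = [w * r for w, r in zip(weights, resp)]
    total = sum(out)
    return [o / total for o in out]


points = [[5, 0], [4, 1], [0, 5], [1, 4]]
out = train_layer(points, [0.25] * 4, init_resp=[0.9, 0.9, 0.1, 0.1])
```

Passing `out` as the `weights` argument of the next call would train the succeeding layer in the manner of claim 10; setting `tol=0` forces exactly `max_iter` iterations, as in claim 12.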

13. (Original) The system of claim 9, wherein all layers of said cascade of classifiers are 
trained by successive iterations of said expectation maximization methodology until a 
convergence of parameter values associated with output distributions of all layers occurs, 
wherein during each step of said iterations, the output distribution of each layer is used to
weigh the input distribution of a succeeding layer. 

14. (Original) The system of claim 13, wherein said successive iterations comprise a fixed 
number of iterations. 

15. (Original) The system of claim 2, wherein each classifier layer generates a relevancy 
score associated with each data point, wherein said relevancy score comprises an indication of
how closely matched said data point is to said example documents.

16. (Original) The system of claim 2, wherein each classifier layer generates a relevancy
score associated with said document, wherein said relevancy score is calculated from relevancy 
scores of individual data points within said document. 

17. (Original) The system of claim 5, wherein said terminal classifier generates a relevancy 
score associated with each data point, wherein said relevancy score comprises an indication of 
how closely matched said data point is to said example documents, and wherein said relevancy 
score is computed by combining relevancy scores generated by classifiers at each layer of the 
cascade. 

18. (Original) The system of claim 2, wherein said terminal classifier generates a relevancy 
score associated with a document, wherein said relevancy score is calculated from relevancy 
scores of individual data points within said document. 
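
Claims 16 and 18 state only that a document's relevancy score is calculated from the scores of its data points. As a sketch, the function below aggregates by either the maximum or the mean; both rules, and the names `document_scores` and `rank`, are assumptions made for illustration.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Assumed aggregation rules; the claims only say the document score is
# "calculated from" the data-point scores.
AGGREGATORS: Dict[str, Callable[[Sequence[float]], float]] = {"max": max, "mean": mean}


def document_scores(point_scores: Dict[str, List[float]],
                    how: str = "max") -> Dict[str, float]:
    """Score each document from the relevancy scores of its data points."""
    agg = AGGREGATORS[how]
    # Documents with no scored data points are skipped.
    return {doc: agg(scores) for doc, scores in point_scores.items() if scores}


def rank(doc_scores: Dict[str, float]) -> List[str]:
    """Order documents from most to least relevant."""
    return sorted(doc_scores, key=doc_scores.get, reverse=True)


scores = {"doc_a": [0.2, 0.9], "doc_b": [0.4, 0.6], "doc_c": []}
by_max = document_scores(scores, "max")
by_mean = document_scores(scores, "mean")
```

A maximum rewards a document with one strongly matching passage, while a mean favors documents that are relevant throughout.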

19. (Original) The system of claim 1, wherein features of said feature vectors comprise words 
within a range of words located proximate to entities of interest in said document. 
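
One way to build the data points of claims 1 and 19 is to take, for each mention of an entity of interest, the words within a fixed window around it and arrange their counts as a vector over a feature list. The window size, the tokenization by whitespace, the entity positions, and the names below are all assumptions for illustration.

```python
from collections import Counter
from typing import List, Sequence


def window_counts(tokens: Sequence[str], entity_positions: Sequence[int],
                  window: int) -> List[Counter]:
    """One data point per entity mention: counts of words within +/- window tokens."""
    points = []
    for pos in entity_positions:
        lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
        # Exclude the entity token itself; keep only its neighbours.
        context = [t for i, t in enumerate(tokens[lo:hi], start=lo) if i != pos]
        points.append(Counter(context))
    return points


def to_count_vectors(points: Sequence[Counter], vocab: Sequence[str]) -> List[List[int]]:
    """Arrange each data point as a vector of counts over a fixed feature list."""
    return [[p[word] for word in vocab] for p in points]


tokens = "acme corp hired alice as chief engineer at acme".split()
points = window_counts(tokens, entity_positions=[3], window=2)
vectors = to_count_vectors(points, vocab=["hired", "chief", "acme"])
```

With `window=2` around "alice", the context is "corp hired as chief", so the vector over `["hired", "chief", "acme"]` is `[1, 1, 0]`.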

20. (Currently Amended) A method of extracting information, said method comprising: 
inputting a query; 

searching a database of documents based on said query; 

retrieving documents from said database based solely on whether said
documents are relevant to said query using a plurality of classifiers arranged in a hierarchical
cascade of classifier layers, wherein each classifier comprises a set of weighted training data
points comprising feature vectors representing any portion of a document, wherein each said
feature vector is arranged only as a vector of counts for all features in a data point; and

weighing an output from said cascade according to a rate of success of query terms being 
matched by each layer of said cascade, wherein said weighing is performed using a terminal 
classifier. 

21. (Original) The method of claim 20, wherein each classifier accepts an input distribution
of data points and transforms said input distribution to an output distribution of said data points. 

22. (Original) The method of claim 21, wherein each classifier is trained by weighing
training data points at each classifier layer in said cascade by an output distribution generated by 
each previous classifier layer, and wherein weights of said training data points of said first 
classifier layer are uniform. 

23. (Original) The method of claim 22, wherein each classifier is trained according to said 
query input. 

24. (Original) The method of claim 21, wherein said query input is based on a minimum
number of example documents. 

25. (Original) The method of claim 20, wherein said document comprises data points
comprising feature vectors representing any portion of said document. 

26. (Original) The method of claim 20, wherein said documents comprise a file format 
capable of being represented by said feature vectors. 

27. (Original) The method of claim 20, wherein said documents comprise any of text files, 
images, web pages, video files, and audio files. 

28. (Original) The method of claim 21, wherein a classifier at each layer in said hierarchical 
cascade is trained for each layer with an expectation maximization methodology that maximizes 
a likelihood of a joint distribution of said training data points and latent variables. 

29. (Original) The method of claim 28, wherein each layer of said cascade of classifiers is 
trained in succession from a previous layer by said expectation maximization methodology, 
wherein said output distribution is used as an input distribution for a succeeding layer. 

30. (Original) The method of claim 28, wherein each layer of said cascade of classifiers is 
trained by successive iterations of said expectation maximization methodology until a 
convergence of parameter values associated with said output distribution of each layer occurs in 
succession. 

31. (Original) The method of claim 30, wherein said successive iterations comprise a fixed
number of iterations. 

32. (Original) The method of claim 28, wherein all layers of said cascade of classifiers are 
trained by successive iterations of said expectation maximization methodology until a 
convergence of parameter values associated with output distributions of all layers occurs, 
wherein during each step of said iterations, the output distribution of each layer is used to
weigh the input distribution of a succeeding layer. 

33. (Original) The method of claim 32, wherein said successive iterations comprise a fixed 
number of iterations. 

34. (Original) The method of claim 21, wherein each classifier layer generates a relevancy 
score associated with each data point, wherein said relevancy score comprises an indication of
how closely matched said data point is to said example documents. 

35. (Original) The method of claim 21, wherein each classifier layer generates a relevancy 
score associated with said document, wherein said relevancy score is calculated from relevancy 
scores of individual data points within said document. 

36. (Original) The method of claim 24, wherein said terminal classifier generates a relevancy 
score associated with each data point, wherein said relevancy score comprises an indication of
how closely matched said data point is to said example documents, and wherein said relevancy 
score is computed by combining relevancy scores generated by classifiers at each layer of the 
cascade. 

37. (Original) The method of claim 21, wherein said terminal classifier generates a relevancy
score associated with a document, wherein said relevancy score is calculated from relevancy 
scores of individual data points within said document. 

38. (Original) The method of claim 20, wherein features of said feature vectors comprise 
words within a range of words located proximate to entities of interest in said document. 

39. (Currently Amended) A program storage device readable by computer, tangibly
embodying a program of instructions executable by said computer to perform a method of
extracting information, said method comprising:

inputting a query; 

searching a database of documents based on said query; 

retrieving documents from said database based solely on whether said
documents are relevant to said query using a plurality of classifiers arranged in a hierarchical
cascade of classifier layers, wherein each classifier comprises a set of weighted training data
points comprising feature vectors representing any portion of a document, wherein each said
feature vector is arranged only as a vector of counts for all features in a data point; and

weighing an output from said cascade according to a rate of success of query terms being
matched by each layer of said cascade, wherein said weighing is performed using a terminal 
classifier. 

40. (Original) The program storage device of claim 39, wherein each classifier accepts an
input distribution of data points and transforms said input distribution to an output distribution of 
said data points. 

41. (Original) The program storage device of claim 40, wherein each classifier is trained by
weighing training data points at each classifier layer in said cascade by an output distribution 
generated by each previous classifier layer, and wherein weights of said training data points of 
said first classifier layer are uniform. 

42. (Original) The program storage device of claim 41, wherein each classifier is trained 
according to said query input. 

43. (Original) The program storage device of claim 40, wherein said query input is based on 
a minimum number of example documents. 

44. (Original) The program storage device of claim 39, wherein said document comprises 
data points comprising feature vectors representing any portion of said document. 

45. (Original) The program storage device of claim 39, wherein said documents comprise a
file format capable of being represented by said feature vectors. 



46. (Original) The program storage device of claim 39, wherein said documents comprise 
any of text files, images, web pages, video files, and audio files. 

47. (Original) The program storage device of claim 40, wherein a classifier at each layer in 
said hierarchical cascade is trained for each layer with an expectation maximization methodology 
that maximizes a likelihood of a joint distribution of said training data points and latent variables. 

48. (Original) The program storage device of claim 47, wherein each layer of said cascade of 
classifiers is trained in succession from a previous layer by said expectation maximization 
methodology, wherein said output distribution is used as an input distribution for a succeeding 
layer. 

49. (Original) The program storage device of claim 47, wherein each layer of said cascade of 
classifiers is trained by successive iterations of said expectation maximization methodology until 
a convergence of parameter values associated with said output distribution of each layer occurs 
in succession. 

50. (Original) The program storage device of claim 49, wherein said successive iterations 
comprise a fixed number of iterations. 

51. (Original) The program storage device of claim 47, wherein all layers of said cascade of
classifiers are trained by successive iterations of said expectation maximization methodology 
until a convergence of parameter values associated with output distributions of all layers occurs, 
wherein during each step of said iterations, the output distribution of each layer is used to
weigh the input distribution of a succeeding layer. 

52. (Original) The program storage device of claim 51, wherein said successive iterations
comprise a fixed number of iterations. 

53. (Original) The program storage device of claim 40, wherein each classifier layer 
generates a relevancy score associated with each data point, wherein said relevancy score
comprises an indication of how closely matched said data point is to said example documents. 

54. (Original) The program storage device of claim 40, wherein each classifier layer 
generates a relevancy score associated with said document, wherein said relevancy score is 
calculated from relevancy scores of individual data points within said document. 

55. (Original) The program storage device of claim 43, wherein said terminal classifier 
generates a relevancy score associated with each data point, wherein said relevancy score 
comprises an indication of how closely matched said data point is to said example documents, 
and wherein said relevancy score is computed by combining relevancy scores generated by 
classifiers at each layer of the cascade.

56. (Original) The program storage device of claim 40, wherein said terminal classifier 
generates a relevancy score associated with a document, wherein said relevancy score is 
calculated from relevancy scores of individual data points within said document. 

57. (Original) The program storage device of claim 39, wherein features of said feature 
vectors comprise words within a range of words located proximate to entities of interest in said 
document. 